Sains Malaysiana 53(4)(2024): 935-951
http://doi.org/10.17576/jsm-2024-5304-16
Detection of Outliers in Circular Regression Model via DFBETAcIS Statistic
(Pengesanan Outlier dalam Model Regresi Bulat melalui Statistik DFBETAcIS)
INTAN MASTURA RAMLEE1,2, SAFWATI IBRAHIM1,2,*, LEOW WAI ZHE3 & MOHD IRWAN YUSOFF3
1Institute of Engineering Mathematics, Universiti Malaysia Perlis, Pauh Putra Main Campus, 02600 Arau, Perlis, Malaysia
2Centre of Excellence for Social Innovation and Sustainability (COESIS), Universiti Malaysia Perlis, 02600 Arau, Perlis, Malaysia
3Faculty of Electrical & Technology Engineering, Universiti Malaysia Perlis, Pauh Putra Main Campus, 02000 Arau, Perlis, Malaysia
Received: 4 July 2023/Accepted: 1 March 2024
Abstract
Outliers in circular regression models have recently received considerable attention. The presence of outliers may change the sign and magnitude of the regression coefficients, resulting in an inaccurate model and incorrect predictions. Previous studies have proposed many methods for detecting outliers in a circular regression model, such as the COVRATIO, D, M, A, and Chord statistics, but these are suspected to perform poorly when a data set contains multiple outliers because masking and swamping were not considered. This study aimed to develop an outlier detection procedure using the DFBETAc statistic for circular cases, in which the new statistic identifies multiple outliers in the Jammalamadaka and Sarma circular regression model (JSCRM) while accounting for masking and swamping effects. Monte Carlo simulations are used to determine the corresponding cut-off points, and the power of performance is investigated. The performance of the proposed statistic is evaluated by the proportion of detected outliers and the rates of masking and swamping. The simulation procedure is applied at 10% and 20% contamination levels for varying sample sizes. The results show that the proposed DFBETAcIS statistic for the JSCRM successfully detects the outliers. For illustration, the procedure is applied to wind direction data.
Keywords: Circular regression model; DFBETAc; outlier
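The simulation workflow summarised in the abstract (a Monte Carlo cut-off point, then the proportion of detected outliers and the masking and swamping rates) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' DFBETAcIS definition, which the abstract does not give: a first-order JSCRM fitted by least squares on cos v and sin v, a plain single-deletion coefficient-change statistic, von Mises errors with concentration 5, outliers created by shifting the response by 0.8π, a cut-off taken as the 95th percentile of the maximum statistic over clean samples, and the function names `fit`, `deletion_stat`, `simulate`, `cutoff` and `rates`.

```python
import numpy as np

def fit(u, v):
    """Least-squares fit of cos v and sin v on [1, cos u, sin u] (assumed first-order JSCRM)."""
    X = np.column_stack([np.ones_like(u), np.cos(u), np.sin(u)])
    Y = np.column_stack([np.cos(v), np.sin(v)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef  # shape (3, 2)

def deletion_stat(u, v):
    """Largest absolute coefficient change when each observation is deleted in turn."""
    full = fit(u, v)
    idx = np.arange(len(u))
    return np.array([np.max(np.abs(full - fit(u[idx != i], v[idx != i]))) for i in idx])

def simulate(rng, n, kappa=5.0):
    """Clean sample: uniform u, v = u + von Mises error (assumed error model)."""
    u = rng.uniform(0.0, 2 * np.pi, n)
    v = (u + rng.vonmises(0.0, kappa, n)) % (2 * np.pi)
    return u, v

def cutoff(rng, n, reps=200, level=95.0):
    """Percentile of the maximum statistic over clean Monte Carlo samples."""
    maxima = [deletion_stat(*simulate(rng, n)).max() for _ in range(reps)]
    return float(np.percentile(maxima, level))

def rates(flagged, is_outlier):
    """Return (detected proportion, masking rate, swamping rate)."""
    flagged = np.asarray(flagged, dtype=bool)
    is_outlier = np.asarray(is_outlier, dtype=bool)
    detected = float(flagged[is_outlier].mean())   # planted outliers that were flagged
    swamping = float(flagged[~is_outlier].mean())  # clean points wrongly flagged
    return detected, 1.0 - detected, swamping

rng = np.random.default_rng(0)
n = 30
c = cutoff(rng, n)

# Contaminate 10% of a fresh sample by shifting the response (assumed mechanism).
u, v = simulate(rng, n)
is_outlier = np.zeros(n, dtype=bool)
is_outlier[:3] = True
v[is_outlier] = (v[is_outlier] + 0.8 * np.pi) % (2 * np.pi)

flagged = deletion_stat(u, v) > c
detected, masking, swamping = rates(flagged, is_outlier)
print(f"cut-off={c:.4f} detected={detected:.2f} masking={masking:.2f} swamping={swamping:.2f}")
```

Here masking is taken as the share of planted outliers that escape detection and swamping as the share of clean observations that are flagged. These are common working definitions and may differ from the ones used in the paper. A single-deletion statistic of this kind is itself prone to masking when several outliers are present, which is the weakness the abstract says the proposed procedure is designed to address.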
Abstrak
Isu data terpencil dalam model regresi bulat baru-baru ini banyak mendapat perhatian. Kehadiran data terpencil boleh menyebabkan tanda dan magnitud pekali regresi berubah, mengakibatkan pembangunan model yang tidak tepat dan ramalan yang salah. Banyak kaedah untuk mengesan data terpencil dalam model regresi bulat telah dicadangkan dalam kajian sebelum ini seperti statistik COVRATIO, D, M, A dan Chord tetapi dipercayai bahawa kaedah tersebut tidak begitu berjaya dengan kehadiran berbilang data terpencil dalam set data kerana litupan dan limpahan tidak diambil kira dalam kajian mereka. Kajian ini bertujuan untuk membangunkan prosedur pengesanan data terpencil menggunakan statistik DFBETAc untuk kes bulatan dengan statistik baharu ini akan mengkaji dan mengenal pasti berbilang data terpencil dalam model regresi bulat Jammalamadaka dan Sarma (JSCRM) dengan mengambil kira kesan litupan dan limpahan. Simulasi Monte Carlo digunakan untuk menentukan titik potong yang sepadan dan kuasa prestasi dikaji. Prestasi statistik yang dicadangkan dinilai oleh perkadaran data terpencil yang dikesan dan kadar litupan dan limpahan. Prosedur simulasi digunakan pada tahap pencemaran 10% dan 20% untuk sampel saiz yang berbeza. Keputusan menunjukkan statistik DFBETAcIS yang dicadangkan untuk JSCRM berjaya mengesan data terpencil. Untuk tujuan ilustrasi, proses ini digunakan pada data arah angin.
Kata kunci: Data terpencil; DFBETAc; model regresi bulat
REFERENCES
Abuzaid, A.H. 2020. Detection of outliers in univariate circular data by means of the outlier local factor (LOF). Statistics in Transition New Series 21(3): 39-51.
Abuzaid, A.H. 2010. Some problems of outliers in circular data. Doctoral dissertation, University of Malaya.
Abuzaid, A.H., Mohamed, I.B. & Hussin, A.G. 2009. A new test of discordancy in circular data. Communications in Statistics-Simulation and Computation 38: 682-691.
Abuzaid, A.H., Hussin, A.G., Rambli, A. & Mohamed, I. 2012. Statistics for a new test of discordance in circular data. Communications in Statistics-Simulation and Computation 41: 1882-1890.
Alkasadi, N.A., Ibrahim, S., Abuzaid, A. & Yusoff, M.I. 2019. Outlier detection in multiple circular regression model using DFFITC statistic. Sains Malaysiana 48(7): 1557-1563.
Alkasadi, N.A., Abuzaid, A.H., Ibrahim, S. & Yusoff, M.I. 2018. Outliers detection in multiple circular regression model via DFBETAc statistic. International Journal of Applied Engineering Research 13: 9083-9090.
Alkasadi, N.A., Ibrahim, S., Ramli, M.F. & Yusoff, M.I. 2016. A comparative study of outlier detection procedures in multiple circular regression. In AIP Conference Proceedings 1775: 030032.
Barnett, V. & Lewis, T. 1994. Outliers in Statistical Data. New York: John Wiley & Sons.
Belsley, D.A., Kuh, E. & Welsch, R.E. 1980. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley & Sons.
Binkley, S.A. 1990. The Clockwork Sparrow: Time, Clocks, and Calendars in Biological Organisms. New Jersey: Prentice Hall.
Chambers, R., Hentges, A. & Zhao, X. 2004. Robust automatic methods for outlier and error detection. Journal of the Royal Statistical Society: Series A (Statistics in Society) 167: 323-339.
Chatterjee, S. & Hadi, A.S. 1988. Impact of simultaneous omission of a variable and an observation on a linear regression equation. Computational Statistics & Data Analysis 6: 129-144.
Collett, D. 1980. Outliers in circular data. Journal of the Royal Statistical Society Series C: Applied Statistics 29(1): 50-57.
Cook, R.D. 1977. Detection of influential observation in linear regression. Technometrics 19(1): 15-18.
Cousineau, D. & Chartier, S. 2010. Outliers detection and treatment: A review. International Journal of Psychological Research 3: 58-67.
Downs, T. 1974. Rotational angular correlation. In Biorhythms and Human Reproduction, edited by Ferin, M., Halberg, F. & van der Wiele, L. New York: Wiley. pp. 97-104.
Fisher, N.I. & Lee, A.J. 1992. Regression models for an angular response. Biometrics 48(3): 665-677.
Follmann, D.A. & Proschan, M.A. 1999. A simple permutation-type method for testing circular uniformity with correlated angular measurements. Biometrics 55(3): 782-791.
Gould, A.L. 1969. A regression technique for angular variates. Biometrics 25(4): 683-700.
Hrushesky, W.J.M. 1985. Circadian timing of cancer chemotherapy. Science 228: 73-75.
Hussin, A.G., Fieller, N.R.J. & Stillman, E.C. 2004. Linear regression model for circular variables with application to directional data. Journal of Applied Science and Technology 9(1 & 2): 1-6.
Ibrahim, S. 2013. Some outlier problems in a circular regression model. Doctoral dissertation, Fakulti Sains, Universiti Malaya.
Ibrahim, S., Rambli, A., Hussin, A.G. & Mohamed, I. 2013. Outlier detection in a circular regression model using COVRATIO statistic. Communications in Statistics-Simulation and Computation 42(10): 2272-2280.
Jammalamadaka, S. & Sarma, Y. 1993. Circular regression. Statistical Science and Data Analysis 34: 109-128.
Jha, J., Biswas, A. & Cheng, T.C. 2022. Trimmed estimator for circular–circular regression: Breakdown properties and an exact algorithm for computation. Statistics 56(2): 375-395.
Johnson, R.A. & Wehrly, T.E. 1978. Some angular-linear distributions and related regression models. Journal of the American Statistical Association 73: 602-606.
Jones, M.C. & Silverman, B.W. 1989. An orthogonal series density estimation approach to reconstructing positron emission tomography images. Journal of Applied Statistics 16: 177-191.
Lowrey, P.L., Shimomura, K., Antoch, M.P., Yamazaki, S., Zemenides, P.D., Ralph, M.R., Menaker, M. & Takahashi, J.S. 2000. Positional syntenic cloning and functional characterization of the mammalian circadian mutation tau. Science 288(5465): 483-492.
Lund, U. 1999. Least circular distance regression for directional data. Journal of Applied Statistics 26: 723-733.
Mackenzie, J.K. 1957. The estimation of an orientation relationship. Acta Crystallographica 10: 61-62.
Mardia, K. 1975. Statistics of directional data (with discussion). Journal of the Royal Statistical Society 37: 390.
Meilán-Vila, A., Crujeiras, R.M. & Francisco-Fernández, M. 2021. Nonparametric estimation of circular trend surfaces with application to wave directions. Stochastic Environmental Research and Risk Assessment 35(4): 923-939.
Mohamed, I.B., Rambli, A., Khaliddin, N. & Ibrahim, A.I.N. 2016. A new discordancy test in circular data using spacings theory. Communications in Statistics-Simulation and Computation 45: 2904-2916.
Mokhtar, N.A., Zubairi, Y.Z., Hussin, A.G. & Moslim, N.H. 2019. An outlier detection method for circular linear functional relationship model using covratio statistics. Malaysian Journal of Science 38(Special Issue 2): 46-54.
Moore-Ede, M.C., Sulzman, F.M. & Fuller, C.A. 1982. The Clocks that Time Us: Physiology of the Circadian Timing System. Massachusetts: Harvard University Press.
Rambli, A., Yunus, R.M., Mohamed, I. & Hussin, A.G. 2015. Outlier detection in a circular regression model. Sains Malaysiana 44(7): 1027-1032.
Rivest, L.P. 1997. A decentered predictor for circular-circular regression. Biometrika 84: 717-726.
Rousseeuw, P.J. & Leroy, A.M. 2005. Robust Regression and Outlier Detection. New York: John Wiley & Sons.
Shearman, L.P., Sriram, S., Weaver, D.R., Maywood, E.S., Chaves, I., Zheng, B., Kume, K., Lee, C.C., van der Horst, G.T.J., Hastings, M.H. & Reppert, S.M. 2000. Interacting molecular loops in the mammalian circadian clock. Science 288(5468): 1013-1019.
Stephens, M.A. 1979. Vector correlation. Biometrika 66(1): 41-48.
Weir, I.S. & Green, P.J. 1994. Modelling data from single-photon emission computerized tomography. Journal of Applied Statistics 21: 313-337.
*Corresponding author; email: safwati@unimap.edu.my